Explicit Fourth-Order Runge–Kutta Method on Intel Xeon Phi Coprocessor
Similar resources
Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor
In this paper we present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM implementation that runs "natively" on the coprocessor, minimizing communication with the host CPU. We run DGEMM natively across a range of matrix sizes, as well as using the Intel Math Kernel Library. Our optimiza...
Effective Barrier Synchronization on Intel Xeon Phi Coprocessor
Barriers are a fundamental synchronization primitive, underpinning the parallel execution models of many modern shared-memory parallel programming languages such as OpenMP, OpenCL or Cilk, and are one of the main challenges to scaling. State-of-the-art barrier synchronization algorithms differ in tradeoffs between critical path length, communication traffic patterns and memory footprint. In thi...
Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor
The Intel® Xeon Phi™ coprocessor platform enables offload of computation from a host processor to a coprocessor that is a fully-functional Intel® Architecture CPU. This paper presents the C/C++ and Fortran compiler offload runtime for that coprocessor. The paper addresses why offload to a coprocessor is useful, how it is specified, and what the conditions for the profitability of offload are. ...
Lattice QCD on Intel Xeon Phi
The Intel Xeon Phi architecture from Intel Corporation features parallelism at the level of many x86-based cores, multiple threads per core, and vector processing units. Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and is of importance in studies of nuclear and hi...
Deep and Shallow Convections in Atmosphere Models on Intel® Xeon Phi™ Coprocessor Systems
Deep and shallow convection calculations account for a significant share of the run time in atmosphere models. These calculations also present significant load imbalances due to varying cloud cover over different regions of the grid. In this work, we accelerate these calculations on Intel® Xeon Phi™ Coprocessor Systems. By employing dynamic scheduling in OpenMP, we demonstrate large reductions in load imbalance ...
Journal
Journal title: International Journal of Parallel Programming
Year: 2016
ISSN: 0885-7458,1573-7640
DOI: 10.1007/s10766-016-0458-x